Abstract. Ge and Stefankovic have recently introduced a novel two-variable
graph polynomial. When specialised to a bipartite graph G and evaluated at
the point $(\tfrac12, 1)$ this polynomial gives the number of independent sets in the
graph. Inspired by this polynomial, they also introduced a Markov chain which, 
if rapidly mixing, would provide an efficient sampling procedure for independent 
sets in G. This sampling procedure in turn would imply the existence of efficient 
approximation algorithms for a number of significant counting problems whose 
complexity is so far unresolved. The proposed Markov chain is promising, in 
the sense that it overcomes the most obvious barrier to mixing. However, we 
show here, by exhibiting a sequence of counterexamples, that the mixing time 
of their Markov chain is exponential in the size of the input when the input is 
chosen from a particular infinite family of bipartite graphs. 

1. Overview
Consider the following basic computational problem: 
Name: #BIS. 

Instance: A bipartite graph G. 

Output: The number of independent sets in G. 

It has long been known that #BIS is #P-complete, and hence presumably intractable
if we insist on an exact solution. However, the computational complexity
of approximating #BIS remains a fascinating open problem. The standard notion
of efficient approximation in the context of counting problems is the "fully
polynomial randomised approximation scheme", or FPRAS. Roughly speaking, an FPRAS
is a polynomial-time randomised algorithm that produces an estimate that is close in
relative error to the true solution with high probability. (See [H Defn 11.2] for
a precise definition.) The most satisfactory situation would be either to have an
FPRAS for #BIS, or a proof that #BIS is NP-hard to approximate. However,
neither of these situations is known to occur.

Dyer et al. [3] noted that a number of counting problems are equivalent to
#BIS under approximation-preserving reducibility, and further #BIS-equivalent
problems have been presented in subsequent work [HE]. Since no FPRAS has been
found for any of the counting problems in this equivalence class, it is becoming
standard to progress under the assumption that #BIS (and hence each of the
#BIS-equivalent problems) does not admit an FPRAS. So finding an FPRAS
for #BIS at this stage would be a significant development. Not only would it 
imply the existence of an FPRAS for several natural counting problems — such 
as counting downsets in a partial order, or evaluating the partition function of 
the ferromagnetic Ising model with local fields — but it would also resolve the 
complexity of approximating #BIS in the opposite direction to the one many people 
expect. 

The most fruitful approach to designing efficient approximation algorithms for 
counting problems has been Markov chain Monte Carlo (MCMC). A direct application
of MCMC to #BIS would work as follows. Given a bipartite graph G
with n vertices, consider the Markov chain whose state space, $\Omega$, is the set of all
independent sets in G, and whose transition probabilities $P(\cdot,\cdot)$ are as follows,
where $\oplus$ denotes symmetric difference and $H(I) = \{I' \in \Omega : |I \oplus I'| = 1\}$:

$$
P(I, I') =
\begin{cases}
1 - \dfrac{|H(I)|}{2n}, & \text{if } I' = I; \\
\dfrac{1}{2n}, & \text{if } I' \in H(I); \\
0, & \text{otherwise.}
\end{cases}
$$

It is easy to check that this Markov chain has the uniform distribution on independent
sets as its unique stationary distribution. So, simulating the Markov chain
for sufficiently many steps would enable us to sample independent sets nearly
uniformly. From there it is a short step to estimating the number of independent
sets jSl §3.2].

To obtain an FPRAS from this approach, one requires that the Markov chain on
independent sets is rapidly mixing, i.e., that it is close to the stationary
distribution after a number of steps that is polynomial in n. Unfortunately, it is clear that
the proposed Markov chain does not have this property. Consider the complete
bipartite graph with equal numbers of vertices in the left and right blocks of the
bipartition. There are $2^{n/2} - 1$ independent sets that have non-empty intersection
with the left block, and the same number with non-empty intersection with the
right. Any sequence of transitions which starts in a left-oriented independent set
and ends in a right-oriented one must necessarily pass through the empty independent
set. Informally, the empty independent set presents an obstruction to rapid
mixing by forming a constriction in the state space. This intuition can be made
rigorous by noting that the "conductance" of the Markov chain is exponentially
small, which implies exponential (in n) mixing time [2, Claim 2.3]. In fact, it is
not even necessary to have a dense graph in order to obtain such a constriction:
degree 6 will do [2, Thm 2.1].

Ge and Stefankovic [4] have recently introduced an intriguing graph polynomial
$R'_2(G; \lambda, \mu)$, in two indeterminates $\lambda$ and $\mu$, that is associated with a bipartite
graph G. At the point $(\lambda, \mu) = (\tfrac12, 1)$ it counts independent sets in G; specifically,
the number of independent sets in G is given by $2^{n-m} R'_2(G; \tfrac12, 1)$, where n is the
number of vertices, and m the number of edges in G [H Thm 4]. This polynomial
inspires them to propose a new Markov chain [4, Defn 6] that potentially could
be used to sample independent sets from a bipartite graph and hence provide an
approximation algorithm for #BIS. The Markov chain, which is described below,
is very different from the one discussed earlier. In particular, its states are subsets
of the edge set of G rather than subsets of the vertex set. Thus, sampling an
independent set of G is a two-stage procedure: (a) sample an edge subset A of G
from the appropriate distribution, and then (b) sample an independent set from a
distribution conditioned on A. Details will be given below.

The encouraging aspect of this new Markov chain, which we call the Ge-Stefankovic
Process, or GS Process for short, is that it is immune to the obvious counterexamples,
such as the complete bipartite graph. Unfortunately, with a certain amount
of effort it is possible to find a counterexample to rapid mixing. In the following 
section we describe the GS Process and construct a sequence of graphs on which 
its mixing time is exponential (in the number of vertices of the graph). Although 
this counterexample rules out their Markov chain as an approach to constructing a 
general FPRAS for #BIS, we may still hope that it provides an efficient algorithm 
for some restricted class of graphs. For example, [4, Theorem 7] shows that it 
provides an efficient algorithm on trees. 
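
To make the "obvious" barrier concrete, the following Python sketch brute-forces the single-site chain on independent sets of a small complete bipartite graph and measures the stationary flow out of the left-oriented sets, divided by their stationary mass. Since the left-oriented sets carry less than half the mass, this ratio is an upper bound on the conductance. The function names and the brute-force enumeration are illustrative assumptions, suitable only for very small graphs.

```python
from itertools import combinations


def independent_sets(n_vertices, edges):
    """Enumerate all independent sets of a small graph as frozensets."""
    result = []
    for r in range(n_vertices + 1):
        for subset in combinations(range(n_vertices), r):
            chosen = set(subset)
            if all(not (a in chosen and b in chosen) for a, b in edges):
                result.append(frozenset(chosen))
    return result


def bottleneck_ratio(k):
    """Flow out of the left-oriented sets of K_{k,k}, divided by their mass."""
    n = 2 * k
    left = range(k)
    edges = [(a, b) for a in left for b in range(k, n)]
    states = independent_sets(n, edges)
    pi = 1.0 / len(states)                  # uniform stationary distribution
    S = [I for I in states if any(v in I for v in left)]
    state_set, S_set = set(states), set(S)
    flow = 0.0
    for I in S:
        for v in range(n):
            J = I ^ frozenset([v])          # flip a single vertex
            if J in state_set and J not in S_set:
                flow += pi / (2 * n)        # P(I, J) = 1/2n for neighbours
    return flow / (pi * len(S))


if __name__ == "__main__":
    for k in range(1, 6):
        print(k, bottleneck_ratio(k))
```

The only transitions leaving the left-oriented sets go from a singleton to the empty set, so the ratio shrinks roughly like $2^{-n/2}$ as the graph grows.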

2. The Ge-Stefankovic Process 

Before stating our result, we need to formalise what we mean by mixing, rapid or
otherwise. Let $(X_t)$ be an ergodic Markov chain with state space $\Omega$, distribution $p_t$
at time t, and unique stationary distribution $\pi$. Let $x_0 \in \Omega$ be the initial state of
the chain, so that $p_0$ assigns unit mass to state $x_0$. Define the mixing time $\tau(x_0)$,
with initial state $x_0 \in \Omega$, as the first time t at which $\tfrac12 \|p_t - \pi\|_1 \le e^{-1}$, i.e., at
which the distance between the t-step and stationary distributions is at most $e^{-1}$
in total variation; then define the mixing time $\tau$ as the maximum of $\tau(x_0)$ over all
choices of initial state $x_0$.
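
As a minimal illustration of this definition, the sketch below computes the mixing time of a small finite chain directly, by iterating the distribution from each start state until the total variation distance to the stationary distribution drops to at most $e^{-1}$. The function names and the three-state example chain are assumptions made for illustration; they do not come from the paper.

```python
import math


def step(dist, P):
    """One step of the chain: row vector `dist` times transition matrix `P`."""
    size = len(P)
    return [sum(dist[i] * P[i][j] for i in range(size)) for j in range(size)]


def tv_distance(p, q):
    """Total variation distance, i.e. half the L1 distance."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def mixing_time(P, pi, max_steps=10_000):
    """Maximum over start states x0 of the first t with TV(p_t, pi) <= 1/e."""
    threshold = math.exp(-1)
    worst = 0
    for x0 in range(len(P)):
        dist = [1.0 if i == x0 else 0.0 for i in range(len(P))]
        t = 0
        while tv_distance(dist, pi) > threshold:
            dist = step(dist, P)
            t += 1
            if t > max_steps:
                raise RuntimeError("chain did not mix within max_steps")
        worst = max(worst, t)
    return worst


# Hypothetical example: lazy random walk on a triangle.
P = [[0.50, 0.25, 0.25],
     [0.25, 0.50, 0.25],
     [0.25, 0.25, 0.50]]
pi = [1 / 3, 1 / 3, 1 / 3]

if __name__ == "__main__":
    print(mixing_time(P, pi))
```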

Suppose $G = (U \cup V, E)$ is a bipartite graph, where U, V are disjoint sets forming
the vertex bipartition, and E is the edge set. We are interested in two probability
spaces, $(\Omega, \pi_\Omega)$ and $(\Sigma, \pi_\Sigma)$, where $\Omega = 2^E$ and $\Sigma = 2^U$. We construct the
probability distributions $\pi_\Omega : \Omega \to [0, 1]$ and $\pi_\Sigma : \Sigma \to [0, 1]$ with the help of a
certain consistency relation $\chi$ on $\Sigma \times \Omega$, which is defined as follows. For a pair
$(I, A) \in \Sigma \times \Omega$, consider the subgraph of $(U \cup V, A)$ induced by the vertex set
$I \cup V$. We say that the relation $\chi(I, A)$ holds iff every vertex of V has even degree
in this subgraph. Start with the probability space of consistent pairs
$\{(I, A) \in \Sigma \times \Omega \mid \chi(I, A)\}$ with the uniform distribution. Then $\pi_\Omega$ (respectively $\pi_\Sigma$) is the
induced marginal distribution on $\Omega$ (respectively $\Sigma$). We call $\pi_\Sigma$ the marginal BIS
distribution on $\Sigma$. It is shown in [H Lemma 10] that $\pi_\Sigma$ is also the distribution
induced on U by a uniform random independent set in G, justifying the name.
In [I], $\pi_\Omega(A)$, for $A \in \Omega$, is defined in terms of the rank of A, viewed as a bipartite
adjacency matrix over $\mathbb{F}_2$; this definition is equivalent to the one given here.
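
The claim that the marginal of the uniform distribution on consistent pairs coincides with the distribution of $S \cap U$ for a uniform random independent set S can be checked by brute force on a tiny graph. The sketch below does this with exact fractions. The helper names and the five-vertex path used as an example are assumptions for illustration, and edges are written as pairs (u, v) with u in U and v in V.

```python
from fractions import Fraction
from itertools import combinations


def subsets(items):
    """All subsets of `items`, as frozensets."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def consistent(I, A, V):
    """chi(I, A): every v in V has even degree in (U + V, A) induced on I + V."""
    return all(sum(1 for (u, w) in A if w == v and u in I) % 2 == 0 for v in V)


def marginal_from_pairs(U, V, E):
    """Marginal on subsets of U of the uniform distribution on consistent pairs."""
    counts = {I: sum(consistent(I, A, V) for A in subsets(E)) for I in subsets(U)}
    total = sum(counts.values())
    return {I: Fraction(c, total) for I, c in counts.items()}


def marginal_from_independent_sets(U, V, E):
    """Distribution of S & U for a uniform random independent set S of G."""
    counts = {I: 0 for I in subsets(U)}
    for S in subsets(list(U) + list(V)):
        if all(not (u in S and v in S) for (u, v) in E):
            counts[S & frozenset(U)] += 1
    total = sum(counts.values())
    return {I: Fraction(c, total) for I, c in counts.items()}


# Hypothetical example: the path u0 - v0 - u1 - v1 - u2.
U = ["u0", "u1", "u2"]
V = ["v0", "v1"]
E = [("u0", "v0"), ("u1", "v0"), ("u1", "v1"), ("u2", "v1")]

if __name__ == "__main__":
    print(marginal_from_pairs(U, V, E) == marginal_from_independent_sets(U, V, E))
```

The reason the two agree is visible in the counts: a set I with k neighbours in V admits $2^{|E|-k}$ consistent edge sets and extends to $2^{|V|-k}$ independent sets, so both weights are proportional to $2^{-k}$.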

The GS Process is an ergodic "single bond flip" Markov chain on state space $\Omega$
which has stationary distribution $\pi_\Omega$. The exact definition of this Markov chain is
not important to us, as our counterexample applies to any Markov chain on state
space $\Omega$ with stationary distribution $\pi_\Omega$ that does not change too many edges in
one step. In order to formalise this last requirement, say that a Markov chain with
transition probabilities $P : \Omega^2 \to [0, 1]$ is d-cautious if

$$P(A, A') > 0 \implies |A \oplus A'| \le d, \quad \text{for all } A, A' \in \Omega.$$

The GS Process is a 1-cautious Markov chain. Our negative result applies to all
d-cautious chains, where d depends at most linearly on the number of vertices of G.

3. A Counterexample to Rapid Mixing

The following lemma (taken from [2, Claim 2.3]) packages the conductance argument
in a convenient way for us to obtain explicit lower bounds on mixing time.

Lemma 1. Let $(X_t)$ be a Markov chain with state space $\Omega$, transition matrix P
and stationary distribution $\pi$. Let $\{S, T\}$ be a partition of $\Omega$ such that $\pi(S) \le \tfrac12$,
and $C \subseteq \Omega$ be a set of states that form a "barrier" in the sense that $P(s, t) = 0$
whenever $s \in S \setminus C$ and $t \in T \setminus C$. Then the mixing time of the Markov chain is
at least $\pi(S)/8\pi(C)$.

Let n, m be positive integers such that $(3/2)^m < 2^n - 1 < (3/2)^{m+1}$. Note that
for every n there is a unique m satisfying the inequalities, and that m depends
linearly on n, asymptotically. The counterexample graph (actually sequence of
graphs indexed by n) $G_n = (U' \cup V \cup U'', E)$ has vertex set $U' \cup V \cup U''$ where
$|U'| = n$ and $|V| = |U''| = m$. The edge set is $E = (U' \times V) \cup M$, where M is a
perfect matching of the vertices in V and U''. Thus, (a) $G_n$ has bipartition (U, V)
where $U = U' \cup U''$, (b) U', V and U'' are all independent sets, (c) the edges
between U' and V form a complete bipartite graph, and (d) the edges between V and U''
form a matching.

Partition $\Sigma$ as $\Sigma = \Sigma_0 \cup \Sigma_1$, where $\Sigma_0 = \{I \in \Sigma \mid I \cap U' = \emptyset\}$ and $\Sigma_1 = \Sigma \setminus \Sigma_0$.
Observe there are $3^m$ independent sets in $G_n$ that exclude all vertices in U', and
$(2^n - 1)2^m$ that include some vertex of U'. Since $\pi_\Sigma$ is the marginal distribution of
independent sets in $G_n$,

$$\pi_\Sigma(\Sigma_0) = \frac{3^m}{3^m + (2^n - 1)2^m} = \alpha,$$

where $\tfrac25 < \alpha < \tfrac12$, by choice of n, m. So $\Sigma_0 \cup \Sigma_1$ is a nearly balanced partition of
the state space $\Sigma$. Also it is easy to check that the cut defined by this partition
is a witness to the conductance of the "single site flip" Markov chain of £Q] being
exponentially small in n. This implies that the mixing time of the single site flip
Markov chain is exponential in n (which, of course, was never in doubt). Next we
identify a partition $\Omega_0 \cup \Omega_1$ that mirrors the partition $\Sigma_0 \cup \Sigma_1$, and itself witnesses
exponentially small conductance of the GS Process.
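
The choice of m and the resulting balance of the partition can be checked numerically. The sketch below picks m from n using exact rational arithmetic, evaluates the fraction $3^m / (3^m + (2^n - 1)2^m)$, and, for very small n, confirms the two counts of independent sets by brute force. All function names are illustrative assumptions. One detail the sketch makes visible is that a positive m only exists once n is at least 2.

```python
from fractions import Fraction
from itertools import product


def choose_m(n):
    """The m >= 1 with (3/2)^m < 2^n - 1 < (3/2)^(m+1); requires n >= 2."""
    target = 2 ** n - 1
    m = 1
    while Fraction(3, 2) ** (m + 1) <= target:
        m += 1
    if not Fraction(3, 2) ** m < target:
        raise ValueError("no positive m exists for this n")
    return m


def alpha(n, m):
    """Mass of the sets avoiding U', as an exact fraction."""
    return Fraction(3 ** m, 3 ** m + (2 ** n - 1) * 2 ** m)


def count_independent_sets(n, m):
    """Brute-force counts (avoiding U', meeting U') for G_n; small n only."""
    avoiding = meeting = 0
    for u1 in product([0, 1], repeat=n):           # indicator on U'
        for v in product([0, 1], repeat=m):        # indicator on V
            if any(u1) and any(v):
                continue                           # U' x V is complete bipartite
            for u2 in product([0, 1], repeat=m):   # indicator on U''
                if any(a and b for a, b in zip(v, u2)):
                    continue                       # matching edge (v_i, u''_i)
                if any(u1):
                    meeting += 1
                else:
                    avoiding += 1
    return avoiding, meeting


if __name__ == "__main__":
    for n in range(2, 12):
        m = choose_m(n)
        print(n, m, float(alpha(n, m)), m / n)
```

The printed ratio m/n settles near $\ln 2 / \ln(3/2) \approx 1.71$, which is the sense in which m depends linearly on n.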

Define the weight w(A) of $A \in \Omega$ to be $w(A) = |A \cap M|$. Partition $\Omega$ as
$\Omega = \Omega_0 \cup \Omega_1$, where $\Omega_0 = \{A \in \Omega \mid w(A) < \tfrac{5}{12}m\}$ and $\Omega_1 = \Omega \setminus \Omega_0$. We aim
to show that the weights of states in $\Omega$ are concentrated around $\tfrac13 m$ and $\tfrac12 m$, and
there are exponentially few states near the boundary of $\Omega_0$ and $\Omega_1$. With a view
to applying Lemma 1, define a "barrier set" (of states) by

$$C = \{A \in \Omega \mid \tfrac38 m \le w(A) \le \tfrac{11}{24} m\}.$$

It is not clear how to sample a state A from the distribution $(\Omega, \pi_\Omega)$ directly,
so instead we sample a state I from $(\Sigma, \pi_\Sigma)$ and then sample u.a.r. a state A
consistent with I, i.e., satisfying $\chi(I, A)$. This amounts to the same thing.

Suppose we start with a state I sampled from $(\Sigma, \pi_\Sigma)$, conditional on $I \in \Sigma_0$.
The set $I \cap U''$ is determined by a Bernoulli process with success probability $\tfrac13$. (For
each edge e in M there are three possibilities for the restriction of the independent
set to e, and only one of them includes a vertex from U''. These choices are
independent for each $e \in M$.) When we come to select a random consistent edge
set A, we must exclude all edges in M that are incident to a vertex in $I \cap U''$.
The other edges in M are free to be included or excluded. So the set of edges
$A \cap M$ is determined by a Bernoulli process with success probability $\tfrac23 \cdot \tfrac12 = \tfrac13$. Thus
$E(w(A)) = \tfrac13 m$ and, by a Chernoff bound, $\Pr(w(A) \ge \tfrac38 m)$ is exponentially small
in m. Specifically,

$$\Pr(A \in C) \le \Pr\bigl(w(A) \ge \tfrac38 m\bigr) \le \exp(-m/576) \tag{1}$$

by [TJ Thm 4.4(2)], setting $\delta = \tfrac18$ and $\mu = \tfrac13 m$.
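
The constants in (1) can be checked by hand. The block below assumes that the cited theorem is the standard multiplicative Chernoff upper-tail bound $\Pr(X \ge (1+\delta)\mu) \le \exp(-\mu\delta^2/3)$ for $0 < \delta \le 1$; that form is an assumption about the cited result rather than something stated here. With $\delta = \tfrac18$ and $\mu = \tfrac13 m$:

```latex
\[
(1+\delta)\mu = \frac{9}{8}\cdot\frac{m}{3} = \frac{3}{8}\,m,
\qquad
\exp\!\left(-\frac{\mu\delta^{2}}{3}\right)
  = \exp\!\left(-\frac{1}{3}\cdot\frac{m}{3}\cdot\frac{1}{64}\right)
  = \exp\!\left(-\frac{m}{576}\right).
\]
```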

Now suppose I is sampled from $(\Sigma, \pi_\Sigma)$, conditional on $I \in \Sigma_1$. Then select a
uniform random A, conditional on the event $\chi(I, A)$. We argue that the probability
that a given edge e = (v, u) of M is included in A is $\tfrac12$, independent of all the other
edges of M. Suppose $v \in V$ and $(v, u) \in M$. Imagine we are deciding which edges
incident to v are to be included in A. First we decide whether to include the edge
(v, u) itself. In selecting the remaining edges for A from the n available, we just
have to make sure that the parity of $|A \cap (\{v\} \times (I \cap U'))|$ is odd, if $(v, u) \in A$ and
$u \in I$, and even otherwise. Since $I \cap U' \ne \emptyset$, the number of ways to do this is $2^{n-1}$,
independent of whether we included edge e in the first place. It follows that the
set of edges $A \cap M$ is determined by a Bernoulli process with success probability
$\tfrac12$. Thus $E(w(A)) = \tfrac12 m$ and, by Chernoff, $\Pr(w(A) \le \tfrac{11}{24} m)$ is exponentially small
in n. Specifically,

$$\Pr(A \in C) \le \Pr\bigl(w(A) \le \tfrac{11}{24} m\bigr) \le \exp(-m/576) \tag{2}$$

by pU Thm 4.5(2)], setting $\delta = \tfrac{1}{12}$ and $\mu = \tfrac12 m$.
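
Both tail bounds can be compared with exact binomial tails. The sketch below assumes, as the argument above indicates, that w(A) is Binomial(m, 1/3) when I avoids U' and Binomial(m, 1/2) otherwise, and that the two thresholds are 3m/8 and 11m/24 (the values given by $(1+\tfrac18)\cdot\tfrac13 m$ and $(1-\tfrac{1}{12})\cdot\tfrac12 m$). The function names are hypothetical, and m is taken divisible by 24 so that both thresholds are integers.

```python
from fractions import Fraction
from math import comb, exp


def binomial_upper_tail(m, p, k):
    """Exact Pr(X >= k) for X ~ Binomial(m, p), with p a Fraction."""
    return sum(comb(m, j) * p ** j * (1 - p) ** (m - j) for j in range(k, m + 1))


def binomial_lower_tail(m, p, k):
    """Exact Pr(X <= k) for X ~ Binomial(m, p), with p a Fraction."""
    return sum(comb(m, j) * p ** j * (1 - p) ** (m - j) for j in range(0, k + 1))


def check_tail_bounds(m):
    """Return both exact tails and the bound exp(-m/576); m divisible by 24."""
    bound = exp(-m / 576)
    upper = binomial_upper_tail(m, Fraction(1, 3), 3 * m // 8)    # w(A) >= 3m/8
    lower = binomial_lower_tail(m, Fraction(1, 2), 11 * m // 24)  # w(A) <= 11m/24
    return float(upper), float(lower), bound


if __name__ == "__main__":
    for m in (24, 240, 960):
        print(m, check_tail_bounds(m))
```

The exact tails are typically far smaller than exp(-m/576); the Chernoff constant is convenient rather than tight.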

We see now that the partition $\Omega_0 \cup \Omega_1 = \Omega$ is balanced, since $\pi_\Omega(\Omega_0) = \alpha \pm o(1)$
and $\tfrac25 < \alpha < \tfrac12$. Moreover, from (1) and (2), $\Pr(\tfrac38 m \le w(A) \le \tfrac{11}{24} m)$ is exponentially
small when A is selected from the distribution $(\Omega, \pi_\Omega)$; specifically, $\pi_\Omega(C) \le
\exp(-m/576)$. Thus the cut $(\Omega_0, \Omega_1)$ is witness to the conductance of the single
bond flip MC being exponentially small. Suppose $d \le m/12$. Observe that no
d-cautious chain can make a transition from $\Omega_0 \setminus C$ to $\Omega_1 \setminus C$. Applying Lemma 1
we therefore obtain the following.

Theorem 2. Suppose that n, m, $G_n$, $\Omega$ and $\pi_\Omega$ are as above, and that $d \le m/12$.
Any ergodic Markov chain on state space $\Omega$ with stationary distribution $\pi_\Omega$ that is
d-cautious has mixing time $\Omega(\exp(m/576))$. In particular, the GS Process, which
is 1-cautious, has mixing time exponential in the number of vertices in $G_n$.
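
The step from Lemma 1 to Theorem 2 can be spelled out as follows, taking $S = \Omega_0$, $T = \Omega_1$ and the barrier C. This is a sketch of the arithmetic only: it assumes the barrier thresholds $\tfrac38 m$ and $\tfrac{11}{24} m$ implied by the two Chernoff settings, uses the fact that a single step changes the weight by at most $|A \oplus A'| \le d$, and leaves the $o(1)$ term unquantified.

```latex
\[
\tfrac{11}{24}\,m - \tfrac{3}{8}\,m = \tfrac{1}{12}\,m \;\ge\; d
\quad\Longrightarrow\quad
\text{no step with } |A \oplus A'| \le d \text{ goes from } w(A) < \tfrac38 m
\text{ to } w(A') > \tfrac{11}{24} m,
\]
\[
\tau \;\ge\; \frac{\pi_\Omega(\Omega_0)}{8\,\pi_\Omega(C)}
     \;\ge\; \frac{\alpha - o(1)}{8\exp(-m/576)}
     \;=\; \Omega\bigl(\exp(m/576)\bigr).
\]
```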

It is also natural to consider a "Swendsen-Wang-style" Markov chain for sampling
from $(\Sigma, \pi_\Sigma)$. Let $I \in \Sigma$ be the current state. Choose A u.a.r. from the set
$\{A \in \Omega \mid \chi(I, A)\}$. Then choose I' u.a.r. from the set $\{I' \in \Sigma \mid \chi(I', A)\}$. The
new state is I'. We can think of this process as a Markov chain on state space
$\Sigma \cup \Omega$ with stationary distribution $\tfrac12\pi_\Sigma$ on $\Sigma$ and $\tfrac12\pi_\Omega$ on $\Omega$. (Assume a continuous
time process to avoid the obvious periodicity.) It follows from the earlier analysis
that the cut $(\Sigma_0 \cup \Omega_0, \Sigma_1 \cup \Omega_1)$ witnesses exponentially small conductance. To
see this, we calculate the probability in stationarity of observing a transition from
$\Sigma_0 \cup \Omega_0$ to $\Sigma_1 \cup \Omega_1$. There are two possibilities: a transition from $\Sigma_0$ to $\Omega_1$, or
one from $\Omega_0$ to $\Sigma_1$. The probability of the former, we have seen, is $\tfrac12\pi_\Sigma(\Sigma_0)$ times
a quantity that is exponentially small in n. The latter is, by time reversibility, the
same as observing, in stationarity, a transition from $\Sigma_1$ to $\Omega_0$. This probability is
again exponentially small in n. Hence the conductance is exponentially small, so
the mixing time of the Swendsen-Wang-style Markov chain is exponential in n.
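
One full step of this Swendsen-Wang-style process can be sketched by brute-force enumeration on a tiny graph. The code below is an illustration under stated assumptions, not an efficient sampler: it lists every consistent edge set and every consistent vertex set explicitly, runs in discrete time, and uses hypothetical names throughout. The example graph is the counterexample graph with n = 2 and m = 2, with edges written as pairs (u, v), u in U and v in V.

```python
import random
from itertools import combinations


def subsets(items):
    """All subsets of `items`, as frozensets."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def consistent(I, A, V):
    """chi(I, A): each v in V meets an even number of A-edges whose U-end is in I."""
    return all(sum(1 for (u, w) in A if w == v and u in I) % 2 == 0 for v in V)


def sw_style_step(I, U, V, E, rng):
    """One step I -> A -> I' by explicit enumeration (tiny graphs only)."""
    A = rng.choice([A for A in subsets(E) if consistent(I, A, V)])
    I_new = rng.choice([J for J in subsets(U) if consistent(J, A, V)])
    return A, I_new


# Counterexample graph with n = 2, m = 2: U' = {a0, a1}, U'' = {b0, b1}.
U = ["a0", "a1", "b0", "b1"]
V = ["v0", "v1"]
E = [(a, v) for a in ("a0", "a1") for v in V] + [("b0", "v0"), ("b1", "v1")]

if __name__ == "__main__":
    rng = random.Random(0)
    I = frozenset()
    for _ in range(5):
        A, I = sw_style_step(I, U, V, E, rng)
        print(sorted(A), sorted(I))
```

Both candidate lists are never empty, since the empty edge set is consistent with every I and the empty vertex set is consistent with every A.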

We can also look a little closer, to see what is going on in more detail. Sample
a state I at random from $(\Sigma, \pi_\Sigma)$, conditioned on the event $I \in \Sigma_0$, and apply
a "half-step" of the SW-like process to obtain a state $A \in \Omega$. We know that
$A \cap M$ is described by a Bernoulli process with success probability $\tfrac13$. Moreover,
it is easy to see the remaining edges of A are Bernoulli with success probability $\tfrac12$.
Now consider the transition from A to I'. As in [4], view the set $I' \cap U'$ as an
n-vector u' over $\mathbb{F}_2$. Each of the vertices in V that is not incident to an edge of
$A \cap M$ generates a linear equation, with constant term zero, constraining u'. These
$\tfrac23 m \approx 1.1397n$ random linear equations constrain just n variables; so with
high probability the only solution is to set all n variables to 0. (Equivalently, a
random $n \times \tfrac23 m$ matrix over $\mathbb{F}_2$ has rank n with high probability.) In other words,
$I' \cap U' = \emptyset$, except with exponentially small probability, and we find ourselves
back in $\Sigma_0$ again.
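
The rank statement is easy to probe experimentally. The sketch below computes rank over $\mathbb{F}_2$ with a standard XOR-basis elimination on rows stored as integer bitmasks, then estimates how often a uniformly random matrix with n rows and slightly more columns has rank n. It idealises the situation in the text by fixing the number of columns instead of drawing it at random, and the function names and parameters are illustrative assumptions.

```python
import random


def gf2_rank(rows):
    """Rank over F_2 of a matrix given as a list of integer bitmasks, one per row."""
    basis = []                        # reduced rows, kept in decreasing order
    for row in rows:
        for b in basis:
            row = min(row, row ^ b)   # clears b's leading bit from row if present
        if row:
            basis.append(row)
            basis.sort(reverse=True)
    return len(basis)


def full_rank_frequency(n, cols, trials, rng):
    """Fraction of uniformly random n x cols matrices over F_2 that have rank n."""
    hits = 0
    for _ in range(trials):
        rows = [rng.getrandbits(cols) for _ in range(n)]
        hits += gf2_rank(rows) == n
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(1)
    for n in (10, 20, 40, 80):
        cols = round(1.1397 * n)      # roughly (2/3) m columns for n rows
        print(n, cols, full_rank_frequency(n, cols, 500, rng))
```

A rank deficiency corresponds to a nonzero solution u', that is, to a step that leaves $\Sigma_0$; the observed frequency of full rank approaches 1 as n grows.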